Abstract - In this paper, we first introduce the extended binary
representation of non-binary codes, which corresponds to a covering
graph of the bipartite graph associated with the non-binary
code. Then we show that non-binary codewords correspond to 
binary codewords of the extended representation that further 
satisfy some simplex-constraint: that is, bits lying over the same 
symbol-node of the non-binary graph must form a codeword of 
a simplex code. Applied to the binary erasure channel (BEC), 
this description leads to a binary erasure decoding algorithm 
of non-binary LDPC codes, whose complexity depends linearly 
on the cardinality of the alphabet. We also give insights into 
the structure of stopping sets for non-binary LDPC codes, and 
discuss several aspects related to upper-layer FEC applications. 

I. Introduction 

Data loss recovery - for instance, for content distribution 
applications or for distributed storage systems - is widely 
addressed using erasure codes that operate at the transport/link 
or the application layer of the communication system. Source 
data packets are extended with repair packets that are used to 
recover the lost data at the receiver. In this context, Maximum
Distance Separable (MDS) codes are ideal codes, in the sense 
that decoding is possible as soon as the number of received 
packets equals the number of source data packets. However, 
for large block lengths, their decoding becomes intractable,
and thus iteratively decoded graph-based codes constitute the 
main alternative. Binary Low-Density Parity-Check (LDPC) 
codes [1], with iterative decoding, have been proven to perform 
asymptotically close to the channel capacity [2] [3], while 
the decoding complexity per decoded bit is independent of 
the code length. Tanner represented LDPC codes by sparse 
bipartite graphs, and showed that they can be generalized 
by replacing single parity check-nodes with more general 
constraint-nodes [4]. Nowadays, these codes are referred to as
GLDPC codes and were recently investigated for the BEC [5], 
[6]. Another class of graph-codes, which have the attractive 
property of being able to generate an infinite sequence of repair 
packets, are the rateless codes proposed in [7] [8]. Over the 
past few years there has also been an increased interest in
non-binary LDPC codes due to their enhanced correction capacity.
They were mainly investigated for physical-layer channels, but
at this time only a few works deal with the BEC [9], [10],
[11]. Despite their performance, non-binary LDPC codes still 
have to overcome the obstacle of decoding complexity in order 
to become attractive for practical systems. 

In this paper, we introduce the extended binary representation
of non-binary codes. From a graph point of view,

This work has been partially supported by the French ANR grant N 2006 
TCOM 019 (CAPRI-FEC project) 



the extended representation corresponds to a covering graph 
of the bipartite graph representing the non-binary code. The 
covering graph represents a binary code, and we show that 
any non-binary codeword can be lifted to a binary codeword 
of the covering graph. This gives a one-to-one correspondence 
between non-binary codewords and binary codewords of the 
covering graph that are further constrained by a simplex code¹
(that is, bits lying over the same symbol-node of the non- 
binary graph must form a codeword of a simplex code). By 
using the extended representation, we derive a binary erasure 
decoding for the BEC, whose complexity depends linearly on 
the cardinality of the alphabet, and which recovers the values
of the erased bits from messages received from both simplex 
and parity check constraints. 

The paper is organized as follows. In section II we fix
the notation used throughout the paper, and we review the 
construction of non-binary LDPC codes and their decoding 
over the BEC. The extended binary representation of non-binary
codes is introduced in section III. In section IV we
derive the binary erasure decoding of non-binary LDPC codes, 
and we discuss stopping sets and several aspects related to 
upper-layer FEC applications. Finally, section V concludes the
paper. 

II. Non-binary LDPC codes 

We consider non-binary codes defined over an alphabet $A$
with $q$ elements, where $q = 2^p$ is a power of 2 (the last
condition is only assumed for practical reasons). We assume
that $A$ is endowed with a vector space structure over $\mathbb{F}_2$ (the
field with 2 elements), and we fix once and for all an isomorphism
of vector spaces:

$$A \cong \mathbb{F}_2^p \qquad (1)$$

Elements of $A$ will also be called symbols, and we say that
$(x_0, \ldots, x_{p-1}) \in \mathbb{F}_2^p$ is the binary image of the symbol $X \in A$
if they correspond to each other by the above isomorphism.

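Let $L = \mathrm{End}_{\mathbb{F}_2}(A)$ denote the algebra of $\mathbb{F}_2$-endomorphisms
of $A$. By evaluating elements of $L$ on symbols of $A$ we get a
left action of $L$ on $A$, which will be denoted multiplicatively:

$$L \times A \to A : (h, X) \mapsto hX := h(X) \qquad (2)$$

Any matrix $H \in M_{M,N}(L)$ defines a code $C \subset A^N$:

$$C = \ker(H) \subset A^N \qquad (3)$$

that is, $(X_1, \ldots, X_N) \in C$ if and only if

$$\sum_{n=1}^{N} h_{m,n} X_n = 0, \quad \forall m = 1, \ldots, M,$$

where $h_{m,n}$ denotes the entries of $H$.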



¹ A simplex code is the dual of a Hamming code.



Remark 1: Codes defined over $\mathbb{F}_q$ - the finite field with $q$
elements - are a particular case of the above definition. The
alphabet of these codes is $A = \mathbb{F}_q$, whose $\mathbb{F}_2$-vector space
structure is inherited from the additive operation on $\mathbb{F}_q$. Also,
the internal field multiplication gives an embedding of $\mathbb{F}_q$ as
a vector subspace of $L = \mathrm{End}_{\mathbb{F}_2}(A)$. We say that the code $C$
is defined over $\mathbb{F}_q$ if $C$ is defined as the kernel of a matrix
$H \in M_{M,N}(\mathbb{F}_q) \subset M_{M,N}(L)$. In this case $C$ is an $\mathbb{F}_q$-vector
subspace of $\mathbb{F}_q^N$.

A. The binary image of a non-binary code

A sequence of symbols $(X_1, \ldots, X_N) \in A^N$ may be
mapped into a binary sequence of length $Np$ via the isomorphism
of (1); this binary sequence will be referred to as the binary
image of the given sequence of symbols. The binary images of
the codewords $(X_1, \ldots, X_N) \in C$ form a linear binary code
$C_{\text{bin}} \subset \mathbb{F}_2^{Np}$, called the binary image of $C$. The isomorphism
of (1) can also be used to further identify:

$$L = \mathrm{End}_{\mathbb{F}_2}(A) \cong \mathrm{End}_{\mathbb{F}_2}(\mathbb{F}_2^p) = M_p(\mathbb{F}_2) \qquad (4)$$

Thus, by replacing each entry of $H \in M_{M,N}(L)$ with its
image under the above identification, we obtain a binary matrix
$H_{\text{bin}} \in M_{Mp,Np}(\mathbb{F}_2)$, which is the parity check matrix of the
binary code $C_{\text{bin}}$.

Remark 2: To avoid confusion, vectors will always be left-multiplied
by a given matrix (unless the contrary is explicitly
stated). Thus, if $h \in L$ and $m_h \in M_p(\mathbb{F}_2)$ is its binary image,
we have $hX = Y \iff m_h (x_0, \ldots, x_{p-1})^t = (y_0, \ldots, y_{p-1})^t$,
for all $X, Y \in A$.

B. Graphical representation 

The bipartite graph associated with a non-binary code $C$,
denoted by $\mathcal{H}$, consists of $N$ symbol-nodes and $M$ constraint-nodes²
representing respectively the $N$ columns and the $M$
rows of the matrix $H$. A symbol-node and a constraint-node
are connected by an edge of $\mathcal{H}$ if the corresponding
entry of the matrix $H$ is a non-zero element of $L$ (note that the
corresponding entry is not assumed invertible!). Each edge of
the graph is further labeled by the corresponding non-zero
entry of $H$. We also denote by $\mathcal{H}(n)$ the set of constraint-nodes
connected to a given symbol-node $n \in \{1, 2, \ldots, N\}$,
and by $\mathcal{H}(m)$ the set of symbol-nodes connected to a given
constraint-node $m \in \{1, 2, \ldots, M\}$.

C. Decoding over the BEC 

In this section we assume that a non-binary LDPC code
is used over the BEC($\epsilon$), the binary erasure channel with
erasure probability $\epsilon$. Thus, the length-$N$ sequence of encoded
symbols is mapped into its binary image of length $Np$, which
is transmitted over the BEC, each bit of the binary image
being erased with probability $\epsilon$.

At the receiver, the received bits are used to reconstruct
the corresponding symbols of the transmitted codeword. Let n 
be a symbol-node of the Tanner graph. We say that a symbol 

² These nodes are generally called check-nodes. However, we will use
constraint-nodes for non-binary codes, and check-nodes for binary codes.



$X \in A$ is eligible for the node $n$ if the probability of the
$n$-th transmitted symbol being $X$ is non-zero. Taking into
consideration the channel output, the set of eligible symbols,
denoted by $\mathcal{E}_n$, consists of the symbols whose binary images
fit with the received bits (if any) of the transmitted symbol.
These sets constitute the a priori information of the decoder.
They are iteratively updated by exchanging messages between
symbol and constraint-nodes in the graph. Each message is a
subset of $A$, representing a set of eligible symbols, either from
the constraint-node or from the symbol-node perspective:

• Each constraint-node $m$ represents a linear combination
of symbol-nodes $n \in \mathcal{H}(m)$, whose coefficients are given
by the corresponding edge labels. The constraint-node
$m$ is verified if this linear combination is equal to zero.
Therefore, for each $n \in \mathcal{H}(m)$ we can derive a set of
eligible symbols, denoted by $\mathcal{E}_{m,n}$, according to the sets
of eligible symbols $\mathcal{E}_{n'}$ with $n' \in \mathcal{H}(m) \setminus \{n\}$.

• On the other hand, each symbol-node $n$ is involved in
several linear constraints given by the nodes $m \in \mathcal{H}(n)$,
all of which must be verified. Therefore, we can update
the set $\mathcal{E}_n$ by taking into account the sets of eligible
symbols $\mathcal{E}_{m,n}$, with $m \in \mathcal{H}(n)$.

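Using the above notation, the iterative decoding for the BEC
can be expressed as follows (see also [11]):

• constraint-node processing

$$\mathcal{E}_{m,n} = \sum_{n' \in \mathcal{H}(m) \setminus \{n\}} h_{m,n'} \mathcal{E}_{n'}$$

• symbol-node processing

$$\mathcal{E}_n = \mathcal{E}_n \cap \left( \bigcap_{m \in \mathcal{H}(n)} h_{m,n}^{-1} \mathcal{E}_{m,n} \right)$$

where $h_{m,n}^{-1} \mathcal{E}_{m,n} := \{X \in A \mid h_{m,n} X \in \mathcal{E}_{m,n}\}$ (recall
that $h_{m,n}$ is not assumed to be invertible).
These two steps are iterated as long as the cardinality of any
$\mathcal{E}_n$ can be decreased. The decoding succeeds whenever all the
sets of eligible symbols $\mathcal{E}_n$ get cardinality 1. It can be seen
that any set of eligible symbols, $\mathcal{E}_n$ or $\mathcal{E}_{m,n}$, is an $\mathbb{F}_2$-affine
subspace of $A$; in particular, its cardinality is a power of 2.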
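
The two processing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: symbols are stored as integers in $\{0, \ldots, q-1\}$ (bit $i$ of the integer is $x_i$), each edge label is given by the rows of its binary image $m_h$ stored as bitmasks, and the names `apply_h` and `decode_sets`, as well as the serial update order, are choices made for this example only.

```python
def parity(v):
    """Parity of the number of set bits of v."""
    return bin(v).count("1") & 1

def apply_h(rows, x):
    """Binary image of hX: bit i of the result is <row_i of m_h, x> mod 2."""
    return sum(parity(r & x) << i for i, r in enumerate(rows))

def decode_sets(H, received, p):
    """H: list of constraints, each a dict {n: rows of m_h} (hypothetical layout).
    received: per symbol, a list of p bits, with None for an erased bit."""
    q = 1 << p
    # A priori sets E_n: symbols whose binary image fits the received bits.
    E = [{X for X in range(q)
          if all(b is None or (X >> i) & 1 == b for i, b in enumerate(bits))}
         for bits in received]
    changed = True
    while changed:
        changed = False
        for row in H:
            for n, h in row.items():
                # Constraint-node processing: E_{m,n} = sum of h_{m,n'} E_{n'}.
                E_mn = {0}
                for n2, h2 in row.items():
                    if n2 != n:
                        image = {apply_h(h2, X) for X in E[n2]}
                        E_mn = {a ^ b for a in E_mn for b in image}
                # Symbol-node processing: keep X with h_{m,n} X in E_{m,n}.
                kept = {X for X in E[n] if apply_h(h, X) in E_mn}
                if len(kept) < len(E[n]):
                    E[n], changed = kept, True
    return E

# Toy example: one constraint X + Y + Z = 0 with p = 2 and identity labels.
identity = [0b01, 0b10]
H = [{0: identity, 1: identity, 2: identity}]
received = [[1, 0], [None, 1], [1, None]]
eligible = decode_sets(H, received, p=2)
print(eligible)
```

In this toy example every set of eligible symbols shrinks to a single symbol, so decoding succeeds. Each set operation here costs up to $q^2$ elementary steps, which is the cost that the extended binary representation is meant to avoid.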

Remark 3: In the above description of the erasure decoding,
a symbol-node $n$ sends the same message $\mathcal{E}_n$ to all its neighbor
constraint-nodes, violating the extrinsic information principle
of a message-passing iterative decoding. However, the erasure
decoding would not be changed by processing symbol-nodes
in an extrinsic manner. This is due to the specificity of the
BEC, which either erases a bit or transmits it correctly.

III. Extended binary representation of a non-binary LDPC code

Let $\mathbb{Z}_q = \{0, 1, \ldots, q-1\}$ denote the set of integers modulo
$q$. The bitwise XOR operation endows $\mathbb{Z}_q$ with a vector space
structure over $\mathbb{F}_2$, and the mapping $\mathbb{Z}_q \to \mathbb{F}_2^p$ that sends an
integer into its binary decomposition³ defines a vector space
isomorphism.

³ We assume that the first bit of the binary decomposition is the least
significant bit.



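Let $h \in L$ and let $m_h \in M_p(\mathbb{F}_2)$ be its binary image.
By using the above isomorphism, we obtain the following
endomorphism of $\mathbb{Z}_q$:

$$\Phi_h : \mathbb{Z}_q \cong \mathbb{F}_2^p \xrightarrow{\; m_h^t \;} \mathbb{F}_2^p \cong \mathbb{Z}_q$$

where $m_h^t$ is the transpose of the matrix $m_h$. Thus, $\Phi_h$ satisfies
$\Phi_h(i \wedge j) = \Phi_h(i) \wedge \Phi_h(j)$, where $\wedge$ is the bitwise XOR
operation. The matrix $M_h \in M_{q-1}(\mathbb{F}_2)$ defined by:

$$(M_h)_{i,j} = \begin{cases} 1, & \text{if } j = \Phi_h(i) \\ 0, & \text{otherwise} \end{cases}$$

where $(i, j) \in \mathbb{Z}_q^* \times \mathbb{Z}_q^*$ (the non-zero elements of $\mathbb{Z}_q$), is called
the extended matrix representation of $h$. When $h$ is an invertible element of $L$
(or, equivalently, $m_h$ is an invertible matrix of $M_p(\mathbb{F}_2)$), $\Phi_h$
induces a permutation of $\mathbb{Z}_q^*$, thus $M_h$ is a permutation matrix.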

Remark 4: The use of $\mathbb{Z}_q$ in the above definition is only
intended for indexing rows and columns of $M_h$ by integers
rather than by symbols of $A$ or by elements of $\mathbb{F}_2^p$.

Example 5: Assume that $p = 3$, and let $h \in L$ with binary
image $m_h$ given by:

$$m_h = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}$$

The rows of $m_h$ define respectively $\Phi_h(1)$, $\Phi_h(2)$, and $\Phi_h(4)$.
Thus, $\Phi_h(1) = 5$ is the integer whose binary decomposition
is given by the first row of $m_h$, and similarly $\Phi_h(2) = 7$ and
$\Phi_h(4) = 6$. Finally:

• $\Phi_h(3) = \Phi_h(1) \wedge \Phi_h(2) = 2$

• $\Phi_h(5) = \Phi_h(1) \wedge \Phi_h(4) = 3$

• $\Phi_h(6) = \Phi_h(2) \wedge \Phi_h(4) = 1$

• $\Phi_h(7) = \Phi_h(1) \wedge \Phi_h(2) \wedge \Phi_h(4) = 4$
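
The values of Example 5 can be reproduced with a few lines of Python. The sketch below is illustrative only: the function names `phi` and `extended_matrix` are hypothetical, each row of $m_h$ is stored as an integer with the least significant bit first, and the extended matrix is built under the assumption that row $i$ carries its single 1 in column $\Phi_h(i)$.

```python
def phi(rows, k):
    """Phi_h(k): XOR of the rows of m_h selected by the set bits of k.

    rows[i] is row i of m_h stored as an integer (least significant bit first).
    """
    out = 0
    for i, row in enumerate(rows):
        if (k >> i) & 1:
            out ^= row
    return out

def extended_matrix(rows):
    """(q-1) x (q-1) matrix with a 1 at (i, j) when j = Phi_h(i), for i, j >= 1."""
    q = 1 << len(rows)
    M = [[0] * (q - 1) for _ in range(q - 1)]
    for i in range(1, q):
        j = phi(rows, i)
        if j != 0:  # Phi_h(i) = 0 can only occur when h is not invertible
            M[i - 1][j - 1] = 1
    return M

# Rows (1,0,1), (1,1,1), (0,1,1) of Example 5, least significant bit first.
m_h = [0b101, 0b111, 0b110]
values = [phi(m_h, k) for k in range(1, 8)]
M_h = extended_matrix(m_h)
print(values)  # Phi_h(1), ..., Phi_h(7)
```

Since this $m_h$ is invertible, every row and every column of `M_h` contains exactly one 1, which is the permutation-matrix property stated above.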

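Before defining the extended binary representation of a
non-binary code, let us further develop this example. Consider now
a non-binary code defined by a single linear constraint:

$$h_1 X + h_2 Y + h_3 Z = 0,$$

where $h_1, h_2, h_3 \in L$, and $X, Y, Z \in A$. Assume that after
replacing $h_1$, $h_2$, $h_3$, and $X$, $Y$, $Z$ by their binary images, the
above equation becomes (see also Remark 2):

$$m_{h_1} (x_0, x_1, x_2)^t + m_{h_2} (y_0, y_1, y_2)^t + m_{h_3} (z_0, z_1, z_2)^t = 0 \qquad (5)$$

that is, a system of three binary equations, of which only the following fragments are legible:

$$\cdots + x_2) + (y_1 + y_2) + \cdots = 0$$
$$(x_1 + x_2) + (y_0 + y_2) + (z_0 + z_1 + z_2) = 0$$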

The main idea of the extended binary representation is to
represent the code by a binary graph whose bit-nodes are in
one-to-one correspondence with the set of all possible linear
combinations of $x_i$'s, $y_i$'s, and $z_i$'s. Therefore, we define:

$$S = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix}$$

and

$$(\alpha_1, \alpha_2, \ldots, \alpha_7) = (x_0, x_1, x_2) \times S$$
$$(\beta_1, \beta_2, \ldots, \beta_7) = (y_0, y_1, y_2) \times S$$
$$(\gamma_1, \gamma_2, \ldots, \gamma_7) = (z_0, z_1, z_2) \times S$$

Note that $S$ is the parity check matrix of a Hamming code,
thus $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_7)$, $\beta = (\beta_1, \beta_2, \ldots, \beta_7)$, and $\gamma =
(\gamma_1, \gamma_2, \ldots, \gamma_7)$ are codewords of the dual Hamming code,
also called simplex code. The above linear equations (5) imply
that:

$$M_1 \alpha + M_2 \beta + M_3 \gamma = 0,$$

where $M_1$, $M_2$, and $M_3$ are the extended matrices associated
with $h_1$, $h_2$, and $h_3$. This equality corresponds to seven binary
parity checks that can be represented by the binary matrix
below (the zero entries are omitted from the matrix for
legibility). The parity checks $c_1$, $c_2$, and $c_4$ correspond to
the linear equations of (5), and all the other parity checks
($c_3$, $c_5$, $c_6$, and $c_7$) correspond to linear combinations of
these.



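[Binary parity-check matrix $(M_1 \mid M_2 \mid M_3)$: rows $c_1, \ldots, c_7$; three column blocks $\alpha$, $\beta$, $\gamma$, each with columns indexed $1, \ldots, 7$. The individual entries are not recoverable here.]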



Definition 6: The matrix $H_{\text{bin}}$ of size $(M(q-1), N(q-1))$,
obtained by replacing each coefficient $h$ of $H$ by its extended
binary matrix $M_h$, is called the extended binary matrix associated
with $H$. The binary code $C_{\text{bin}} = \ker(H_{\text{bin}})$ is called the
extended binary code associated with $C$.

Definition 7: Let $S(p) \in M_{p,q-1}(\mathbb{F}_2)$ be the binary matrix
whose columns represent the binary decomposition of integers
$j \in \{1, \ldots, q-1\}$. The simplex code $\mathcal{S}(p)$ is the $[q-1, p, 2^{p-1}]$
linear binary code with generator matrix $S(p)$.
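
As a small illustration of Definition 7, the sketch below encodes a binary image with the generator matrix $S(p)$ and checks the code parameters for $p = 3$. The function name `simplex_encode` is hypothetical, and a symbol's binary image is assumed to be stored as an integer whose bit $i$ is $x_i$; under that assumption the $k$-th coordinate of the codeword is the inner product of $k$ and $x$ modulo 2.

```python
def simplex_encode(x, p):
    """Codeword (alpha_1, ..., alpha_{q-1}) with alpha_k = <k, x> mod 2.

    Column k of S(p) is the binary decomposition of k, so multiplying the
    binary image x by S(p) amounts to one parity computation per column.
    """
    return [bin(k & x).count("1") & 1 for k in range(1, 1 << p)]

p = 3
codewords = [simplex_encode(x, p) for x in range(1 << p)]
weights = sorted({sum(c) for c in codewords})
print(len(codewords), weights)
```

For $p = 3$ the eight codewords are distinct and every non-zero codeword has weight $4 = 2^{p-1}$, in agreement with the $[q-1, p, 2^{p-1}]$ parameters.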

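Theorem 8: Let $C_{\text{bin}}$ be the extended binary code associated
with a non-binary code $C$.

(1) Let $(X_1, \ldots, X_N) \in C$, and for each $n \in \{1, \ldots, N\}$ let
$(\alpha_{n,1}, \ldots, \alpha_{n,q-1}) \in \mathcal{S}(p)$ be the simplex codeword obtained
by encoding the binary image $(x_{n,0}, \ldots, x_{n,p-1})$ of $X_n$. Then
$(\alpha_{1,1}, \ldots, \alpha_{1,q-1}, \ldots, \alpha_{N,1}, \ldots, \alpha_{N,q-1}) \in C_{\text{bin}}$.

(2) The above mapping defines a vector space isomorphism:

$$C \cong C_{\text{bin}} \cap \mathcal{S}(p)^N$$

where $\mathcal{S}(p)^N = \mathcal{S}(p) \times \cdots \times \mathcal{S}(p) \subset \mathbb{F}_2^{N(q-1)}$ is the vector
space product of $N$ copies of $\mathcal{S}(p)$.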
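
Theorem 8 can be checked by brute force on a toy code. The sketch below is an illustration under assumptions, not part of the original text: the matrix `H` is an arbitrary made-up example with $p = 2$, $N = 3$, $M = 2$ (one label is deliberately non-invertible), labels are stored as rows of $m_h$ in bitmask form, and all function names are hypothetical. It verifies that a symbol sequence is a codeword of $C$ exactly when its simplex lift satisfies all binary checks.

```python
from itertools import product

def parity(v):
    return bin(v).count("1") & 1

def apply_h(rows, x):
    """Binary image of hX, with m_h given by its rows as bitmasks."""
    return sum(parity(r & x) << i for i, r in enumerate(rows))

def phi(rows, k):
    """Phi_h(k): XOR of the rows of m_h selected by the set bits of k."""
    out = 0
    for i, r in enumerate(rows):
        if (k >> i) & 1:
            out ^= r
    return out

def in_code(H, X):
    """True if every constraint sum_n h_{m,n} X_n vanishes."""
    for row in H:
        s = 0
        for n, h in enumerate(row):
            if h is not None:
                s ^= apply_h(h, X[n])
        if s:
            return False
    return True

def lift(X, p):
    """Simplex-encode each symbol; index 0 holds the trivial combination 0."""
    return [[parity(k & x) for k in range(1 << p)] for x in X]

def in_extended_code(H, alpha, p):
    """True if all M(q-1) binary checks c_{m,k} are satisfied."""
    for row in H:
        for k in range(1, 1 << p):
            s = 0
            for n, h in enumerate(row):
                if h is not None:
                    s ^= alpha[n][phi(h, k)]
            if s:
                return False
    return True

p, N = 2, 3
# Hypothetical parity-check matrix; None marks a zero entry, [1, 1] is singular.
H = [[[1, 2], [3, 1], [2, 3]],
     [[2, 1], None, [1, 1]]]
agree = all(in_code(H, X) == in_extended_code(H, lift(X, p), p)
            for X in product(range(1 << p), repeat=N))
n_codewords = sum(in_code(H, X) for X in product(range(1 << p), repeat=N))
print(agree, n_codewords)
```

The agreement holds for every sequence because the check indexed by $k$ evaluates the inner product of $k$ with the binary image of $\sum_n h_{m,n} X_n$, which vanishes for all non-zero $k$ exactly when the constraint is satisfied.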

An intuitive interpretation of the above theorem is that a
non-binary code can be represented by a graph with $N(q-1)$
bit-nodes and $M(q-1)$ check-nodes connected according to
the extended binary matrix $H_{\text{bin}}$, and $N$ simplex-nodes, each
connected to $(q-1)$ consecutive bit-nodes. Hence, within
a message-passing decoding, the bit-nodes should recover their 



values from messages received from both simplex and check- 
nodes of the graph. Although we are interested in decoding 
non-binary codes over the BEC, the ideas presented in this 
paper might be extrapolated to other channels. 

Remark 9: The extended binary representation is also useful
for understanding aspects related to cycles of the bipartite
graph associated with a non-binary LDPC code. Assume that
all the non-zero entries of $H$ are invertible. Let $\mathcal{H}_{\text{bin}}$ be the
bipartite graph associated with the matrix $H_{\text{bin}}$. It follows from
the construction that $\mathcal{H}_{\text{bin}}$ is a covering graph of $\mathcal{H}$, hence
any cycle of $\mathcal{H}_{\text{bin}}$ lies over some cycle of $\mathcal{H}$. Furthermore,
let $(e_1, e_2, \ldots, e_{2\ell})$ be a cycle of length $2\ell$ of $\mathcal{H}$, and let $h_i$
denote the label of the edge $e_i$. Then, the number and the
length of cycles of $\mathcal{H}_{\text{bin}}$ lying over $(e_1, e_2, \ldots, e_{2\ell})$ can be
derived using the cycle decomposition of the permutation $\Phi_h$,
where $h = h_1 h_2^{-1} \cdots h_{2\ell-1} h_{2\ell}^{-1}$, in a similar way as for
quasi-cyclic codes (see for instance [12]).
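
The cycle decomposition mentioned in Remark 9 is straightforward to compute once the binary image of $h$ is known. The sketch below is illustrative: `cycle_lengths` is a hypothetical helper, it takes the rows of $m_h$ as bitmasks (least significant bit first), assumes $h$ is invertible, and does not compute the product $h_1 h_2^{-1} \cdots h_{2\ell}^{-1}$ itself.

```python
def phi(rows, k):
    """Phi_h(k): XOR of the rows of m_h selected by the set bits of k."""
    out = 0
    for i, r in enumerate(rows):
        if (k >> i) & 1:
            out ^= r
    return out

def cycle_lengths(rows):
    """Sorted cycle lengths of the permutation Phi_h on {1, ..., q-1}.

    Only meaningful when h is invertible, so that Phi_h is a permutation.
    """
    q = 1 << len(rows)
    seen, lengths = set(), []
    for start in range(1, q):
        if start in seen:
            continue
        length, k = 0, start
        while k not in seen:
            seen.add(k)
            k = phi(rows, k)
            length += 1
        lengths.append(length)
    return lengths and sorted(lengths)

example_5 = cycle_lengths([0b101, 0b111, 0b110])
identity = cycle_lengths([0b001, 0b010, 0b100])
print(example_5, identity)
```

For the matrix of Example 5 the permutation is a single cycle through all seven non-zero integers, whereas the identity label yields seven fixed points.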

IV. Linear time erasure decoding 

Similar to section II-C, we assume that a non-binary
LDPC code is used over the BEC($\epsilon$). Let $(X_1, X_2, \ldots, X_N)$
be the length-$N$ sequence of encoded symbols, and let
$(x_{1,0}, \ldots, x_{1,p-1}, \ldots, x_{N,0}, \ldots, x_{N,p-1})$ denote its binary
image of length $Np$, which is transmitted over the BEC, each
of its bits being erased with probability $\epsilon$. At the receiver,
the received bits are used to provide information to
the corresponding bit-nodes in the extended binary graph
$\mathcal{H}_{\text{bin}}$. More precisely, for each coded symbol $X_n$ there are
$q - 1$ corresponding bit-nodes in $\mathcal{H}_{\text{bin}}$, which are denoted by
$(\alpha_{n,1}, \alpha_{n,2}, \ldots, \alpha_{n,q-1})$. Recall that each $\alpha_{n,k}$ corresponds to
a linear combination of $x_{n,0}, \ldots, x_{n,p-1}$, whose coefficients
are given by the binary decomposition of $k \in \{1, \ldots, q-1\}$.
Therefore, the bit-node $\alpha_{n,2^i}$, $0 \le i \le p-1$, corresponds to
the bit $x_{n,i}$ from the binary sequence that is transmitted over
the BEC.

The decoding algorithm is initialized as follows:

• for each received bit $x_{n,i}$ set: $\alpha_{n,2^i} = x_{n,i}$

• set all the other bit-nodes $\alpha_{n,k}$ as erased

Note that a bit-node $\alpha_{n,k}$ is set as erased if either $k$ is not
a power of 2, or $k = 2^i$ but the corresponding bit $x_{n,i}$ was
erased by the channel. Erased bit-nodes are then iteratively
recovered as follows:

• simplex-node processing
for each $n \in \{1, \ldots, N\}$, if bit-nodes $\alpha_{n,k_1}, \ldots, \alpha_{n,k_l}$
are recovered (either received or recovered at the previous
iterations), recover the value of $\alpha_{n,k_1 \wedge \cdots \wedge k_l}$ by:

$$\alpha_{n,k_1 \wedge \cdots \wedge k_l} = \alpha_{n,k_1} \wedge \cdots \wedge \alpha_{n,k_l}$$

• check-node processing
for any check-node $c \in \mathcal{H}_{\text{bin}}$ connected to a single
unrecovered bit-node $\alpha_{n,k}$, recover the value of $\alpha_{n,k}$ as
the XOR of the other bit-nodes connected to $c$.

The simplex-node processing and the check-node processing
are iterated as long as new bit-nodes $\alpha_{n,k}$ can be recovered.
The decoding is successful if all the bit-nodes are recovered
when it stops.
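
The decoding loop described above can be sketched as follows. This is a minimal, unoptimized illustration under assumptions rather than the authors' implementation: labels are stored as rows of $m_h$ in bitmask form, `None` marks an erased bit or a zero entry of $H$, the extra index $k = 0$ holds the trivial combination (always 0), and the function names are hypothetical.

```python
def phi(rows, k):
    """Phi_h(k): XOR of the rows of m_h selected by the set bits of k."""
    out = 0
    for i, r in enumerate(rows):
        if (k >> i) & 1:
            out ^= r
    return out

def simplex_close(a):
    """Simplex-node processing for one symbol: close the recovered set under XOR.

    a[k] is the value of bit-node alpha_{n,k}, or None if erased; a[0] == 0.
    Returns True if at least one bit-node was newly recovered.
    """
    span, in_span, changed = [0], {0}, False
    for k in range(1, len(a)):
        if a[k] is None or k in in_span:
            continue
        for s in list(span):          # snapshot: span doubles in this pass
            t = k ^ s
            if a[t] is None:
                a[t] = a[k] ^ a[s]
                changed = True
            span.append(t)
            in_span.add(t)
    return changed

def decode_extended(H, received, p):
    q, N = 1 << p, len(received)
    # Initialization: alpha_{n,2^i} = x_{n,i} for received bits, rest erased.
    alpha = [[0] + [None] * (q - 1) for _ in range(N)]
    for n, bits in enumerate(received):
        for i, b in enumerate(bits):
            if b is not None:
                alpha[n][1 << i] = b
    # Check c_{m,k} involves bit-node alpha_{n, Phi_h(k)} for each label h of row m.
    checks = [[(n, phi(h, k)) for n, h in enumerate(row) if h is not None]
              for row in H for k in range(1, q)]
    progress = True
    while progress:
        progress = False
        for a in alpha:
            progress |= simplex_close(a)
        for chk in checks:
            erased = [(n, j) for n, j in chk if alpha[n][j] is None]
            if len(erased) == 1:
                n0, j0 = erased[0]
                value = 0
                for n, j in chk:
                    if (n, j) != (n0, j0):
                        value ^= alpha[n][j]
                alpha[n0][j0] = value
                progress = True
    return alpha

# Toy example: one constraint X + Y + Z = 0 with p = 2 and identity labels.
identity = [0b01, 0b10]
H = [[identity, identity, identity]]
received = [[1, 0], [None, 1], [1, None]]
alpha = decode_extended(H, received, p=2)
symbols = [sum(a[1 << i] << i for i in range(2)) for a in alpha]
print(symbols)
```

On this toy input all bit-nodes are recovered and the decoded symbols are 1, 2 and 3. A production decoder would track the degree of each check incrementally instead of rescanning all checks in every iteration.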

It is important to note that the above decoding is equivalent
to the non-binary decoding presented in section II-C. There is
a one-to-one correspondence between recovered bit-nodes and
sets of eligible symbols, which can be described as follows:

• Let $R_n$ be the set of all recovered bit-nodes $\alpha_{n,k}$ after the
simplex-node processing step, for some $n \in \{1, \ldots, N\}$.
Each $\alpha_{n,k} \in R_n$ gives the value of some linear combination
of the bits $x_{n,0}, \ldots, x_{n,p-1}$, that is:

$$\sum_{i=0}^{p-1} k_i x_{n,i} = \alpha_{n,k},$$

where $(k_0, \ldots, k_{p-1})$ is the binary decomposition of $k$.
Let $\mathcal{E}_n \subset A$ be the subset of all symbols whose binary
images verify the above equation for all $\alpha_{n,k} \in R_n$. Then
$\mathcal{E}_n$ coincides with the affine subspace of eligible symbols
defined in section II-C. The fact that $\mathcal{E}_n$ is affine follows
from the fact that for any $\alpha_{n,k}, \alpha_{n,l} \in R_n$, we also have
$\alpha_{n,k \wedge l} \in R_n$.

• Let $m$ be a constraint-node of the non-binary graph $\mathcal{H}$, and
let $c_1, \ldots, c_{q-1}$ be the corresponding parity check-nodes
in the binary graph $\mathcal{H}_{\text{bin}}$. Let $n \in \mathcal{H}(m)$, and denote
by $R_{m,n}$ the set of all bit-nodes $\alpha_{n,k}$ that are recovered
by the check-nodes $c_1, \ldots, c_{q-1}$ from the unerased nodes
among the bit-nodes $\alpha_{n',k'}$, with $n' \in \mathcal{H}(m) \setminus \{n\}$. By
using the same arguments as above, $R_{m,n}$ defines a subset
$\mathcal{E}_{m,n} \subset A$, which coincides with the affine subspace of
eligible symbols defined in section II-C. The fact that
$\mathcal{E}_{m,n}$ is affine follows from the fact that whenever $\alpha_{n,k}$
and $\alpha_{n,l}$ are recovered by check-nodes $c_i$ and $c_j$, then
$\alpha_{n,k \wedge l}$ is also recovered by the check-node $c_{i \wedge j}$.

We now discuss the complexity of the proposed erasure
decoding. The processing of each check-node is done in constant
time. Since the number of check-nodes in $\mathcal{H}_{\text{bin}}$ depends
linearly on $q$, it follows that the check-node processing step
of the decoding algorithm is done in linear time. Moreover,
the simplex-node processing can also be implemented in linear
time. Fix some $n \in \{1, \ldots, N\}$, and let $R_{\text{in}}$ and $R_{\text{out}}$ denote
the sets of recovered bit-nodes $\alpha_{n,k}$ before and after the
simplex-node processing. Then $R_{\text{out}}$ is the "affine subspace"
spanned by $R_{\text{in}}$, in the sense that $R_{\text{out}} \supseteq R_{\text{in}}$ and $\alpha_{n,k \wedge l} \in R_{\text{out}}$
for any $\alpha_{n,k}, \alpha_{n,l} \in R_{\text{out}}$, and it can be computed as follows:

    R_out = {}
    while R_in is not empty
        α_{n,k} ← R_in.pop()
        R_tmp = R_out
        for α_{n,l} ∈ R_tmp
            α_{n,k∧l} = α_{n,k} ∧ α_{n,l}
            R_out = R_out ∪ {α_{n,k∧l}}
            R_in = R_in \ {α_{n,k∧l}}
        end
        R_out = R_out ∪ {α_{n,k}}
    end



It can be easily seen that the above implementation requires
$1 + 2 + \cdots + 2^{|R_{\text{in}}|-1} \le 2^p - 1 = q - 1$ computations,
where $|R_{\text{in}}|$ denotes the dimension of the vector subspace of
$\mathbb{Z}_q$ spanned by $\{k \mid \alpha_{n,k} \in R_{\text{in}}\}$.
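
A direct Python transcription of the span computation is given below as an illustration. The dictionary-based representation (index $k$ mapped to the value of $\alpha_{n,k}$), the name `affine_span`, and the `steps` counter (one unit per bit-node written into $R_{\text{out}}$) are assumptions made for this sketch.

```python
def affine_span(r_in):
    """r_in maps an index k to the recovered value of bit-node alpha_{n,k}.

    Returns the closure R_out under XOR of indices and the number of
    bit-nodes written into it.
    """
    r_in = dict(r_in)                  # work on a copy
    r_out, steps = {}, 0
    while r_in:
        k, v_k = r_in.popitem()
        for l, v_l in list(r_out.items()):
            r_out[k ^ l] = v_k ^ v_l   # alpha_{n,k^l} = alpha_{n,k} XOR alpha_{n,l}
            r_in.pop(k ^ l, None)      # already recovered, no need to revisit
            steps += 1
        r_out[k] = v_k
        steps += 1
    return r_out, steps

# Symbol with binary image (x0, x1, x2) = (1, 0, 1); all three bits received.
full, full_steps = affine_span({1: 1, 2: 0, 4: 1})
# Three recovered bit-nodes whose indices span a subspace of dimension 2.
part, part_steps = affine_span({1: 1, 2: 0, 3: 1})
print(full_steps, part_steps)
```

In the first call the indices span a subspace of dimension 3 and the loop writes $1 + 2 + 4 = 7 = q - 1$ bit-nodes; in the second the dimension is 2 and only $1 + 2 = 3$ are written, matching the bound above.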

The above discussion is summarized by the following:

Theorem 10: The complexity of the extended binary era- 
sure decoding of non-binary LDPC codes depends linearly on 
the size of the alphabet. 

Before concluding the paper, we would like to emphasize
some other advantages of the extended binary decoding. These 
aspects will be developed in future works. 

1) Stopping sets. Similar to binary LDPC codes, we can
define stopping sets, corresponding to erasure patterns from
which the decoding cannot recover. Thus, a stopping set is
a subset $\mathcal{V}$ of the set of bit-nodes of $\mathcal{H}_{\text{bin}}$, such that:

• if $\alpha_{n,k \wedge l} \in \mathcal{V}$ then either $\alpha_{n,k} \in \mathcal{V}$ or $\alpha_{n,l} \in \mathcal{V}$

• check-nodes that are neighbors of $\mathcal{V}$ are connected to $\mathcal{V}$
at least twice.

Hence, the finite length analysis of non-binary LDPC codes
over the BEC can be derived by using techniques similar to
those developed in [13].

2) UL-FEC applications. In practical systems, data packets 
received at the upper-layers encounter erasures, and erasure 
codes are used to recover the erased data packets. If non-binary 
LDPC codes are used in such situations, the coded symbols 
must be transverse to data packets: that is, the p bits of a 
symbol must belong to p different data packets (otherwise if, 
for instance, all the p bits of a symbol belong to the same data 
packet, the coded symbols will be either completely received 
or completely erased, and the non-binary code would operate 
as a binary code)⁴. The ability of the decoding algorithm to
deal with data packets instead of bits is an
attractive feature of an erasure code. The proposed extended
binary decoding is well-suited for UL-FEC applications as it
can easily deal with data packets: the bit-nodes $\alpha_{n,k}$ would
correspond to packets instead of single bits, but the decoding
would work the same way, simply by performing bitwise XOR
of packets $\alpha_{n,k}$.

3) Flexibility and small coding rates. Another interesting
feature of the proposed decoding is the possibility of using
incremental redundancy in order to cope with severe channel
conditions. This can be done by transmitting all the $N(q-1)$
values of the bit-nodes $\alpha_{n,k}$ over the channel, instead of
transmitting only the $Np$ bits $x_{n,i}$ of the binary image. This
is illustrated in Figure 1. We use an irregular LDPC code over
$\mathbb{F}_{16}$, with rate $r = 1/2$. If all the $N(q-1)$ values of
the bit-nodes $\alpha_{n,k}$ are transmitted over the channel, the coding
rate is decreased to $r' = \frac{p}{q-1} r = 2/15$. As can be seen,
in both situations, the code operates very close to the channel
capacity. For large values of $q$, the incremental redundancy
turns the code into an almost rateless code.
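
The rate quoted above can be checked with a short computation, added here for illustration. A rate-$r$ code with $N$ coded symbols carries $rN$ source symbols, that is $rNp$ source bits, while sending every bit-node means transmitting $N(q-1)$ bits:

```latex
r' = \frac{rNp}{N(q-1)} = \frac{p}{q-1}\, r,
\qquad
q = 16,\ p = 4,\ r = \tfrac{1}{2}
\;\Longrightarrow\;
r' = \frac{4}{15} \cdot \frac{1}{2} = \frac{2}{15} \approx 0.133 .
```

Since $p/(q-1) = \log_2 q / (q-1)$ decreases quickly with $q$ (for example $8/255$ for $q = 256$), larger alphabets give access to much lower rates with the same code.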

⁴ This contrasts with other non-binary UL-FEC codes, such as
Reed-Solomon codes, for which the p bits of a symbol must belong to the same
data packet.
Fig. 1. Incremental redundancy using non-binary LDPC codes



V. Conclusions 

We showed that non-binary LDPC codes can be described in 
terms of binary parity-check and simplex constraints. On the 
one hand, this description can be used for decoding non-binary 
LDPC codes, and the proposed decoding presents several 
attractive properties for practical applications: low complexity, 
capability of dealing with data packets for UL-FEC applica- 
tions, on-the-fly decoding, incremental redundancy, and small 
coding rates. On the other hand, the proposed description 
gives insights into the structure of non-binary codes, and it is
very likely that it can be used for both finite length and
asymptotic analysis of non-binary LDPC codes.